Considerations for Elasticsearch Dynamic Field Mapping
TL;DR
- Explicit Mapping is strongly recommended for production environments to ensure performance and storage efficiency.
- Dynamic Mapping causes string fields to generate dual indexes for `text` and `keyword`, leading to wasted storage space.
- Special features (such as geolocation, nested objects, and custom analyzers) cannot be automatically enabled via dynamic mapping and must be defined manually.
- Over-reliance on dynamic mapping can trigger a "Mapping Explosion," causing the number of index fields to exceed the limit (default is 1000).
- It is recommended to set the `dynamic` parameter to `false` or `strict` to prevent the index structure from becoming uncontrollable.
Misconceptions About Dynamic Field Mapping
In Elasticsearch, while Dynamic Mapping provides convenience during the initial development phase, it hides several risks in production environments.
1. String Types Lead to Storage Bloat
When you might encounter this issue: When Elasticsearch automatically infers the types of string fields.
Under dynamic mapping, Elasticsearch indexes strings as both `text` and `keyword` by default: `text` is used for full-text search, while `keyword` is used for exact matching and aggregations. This dual-indexing mechanism significantly increases storage space. Unless both capabilities are needed, you should explicitly define field types to save space.
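As a minimal sketch (the index and field names here are illustrative, not from any real project), an explicit mapping can keep an identifier-like field as `keyword` only, while reserving the dual `text` + `keyword` behavior for fields that genuinely need both:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "sku":   { "type": "keyword" },
      "title": { "type": "text" }
    }
  }
}
```

Here `sku` is only exact-matched and aggregated, so a single `keyword` index suffices; `title` is only full-text searched, so it omits the `.keyword` sub-field that dynamic mapping would have added.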
2. Special Features Cannot Be Automatically Enabled
When you might encounter this issue: When you need to use geolocation, nested structures, or custom analyzers.
Dynamic mapping can only handle basic types and cannot recognize specific requirements:
- Geolocation: If not pre-defined as `geo_point` or `geo_shape`, the system treats it as a standard `object`, making it impossible to use geolocation query APIs.
- Nested Objects: Dynamic mapping treats objects that should be `nested` as flattened `object` types, causing objects within arrays to be queried incorrectly.
- Custom Analyzers: Dynamic mapping always uses the default `standard` analyzer, making it impossible to apply Chinese word segmentation or synonym processing.
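All three features can only be enabled up front in an explicit mapping. A sketch combining them (the index name, field names, and the synonym analyzer are hypothetical examples, assuming a synonym token filter is what you need):

```json
PUT /stores
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["shop, store"]
        }
      },
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "location":    { "type": "geo_point" },
      "reviews":     { "type": "nested" },
      "description": { "type": "text", "analyzer": "my_synonym_analyzer" }
    }
  }
}
```

With this mapping, `location` supports geo queries, objects inside the `reviews` array are matched independently, and `description` is analyzed with the custom analyzer; none of these would result from indexing the same documents against a dynamically created mapping.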
3. The Risk of Mapping Explosion
When you might encounter this issue: When the data source contains a large number of non-fixed field names (such as user-defined fields).
If there are too many fields in an index, it causes a surge in memory consumption. Elasticsearch limits each index to a maximum of 1000 fields by default; once this limit is exceeded, the system will reject new document writes.
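The field limit is governed by the `index.mapping.total_fields.limit` setting. If you genuinely need more fields, it can be raised per index (the index name below is illustrative), though raising it only postpones the underlying problem rather than solving it:

```json
PUT /my-index/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```

For data sources with unbounded field names, restricting `dynamic` or remodeling user-defined fields as key-value pairs is usually the sounder fix.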
Dynamic Mapping Type Inference Rules
The rules by which Elasticsearch automatically infers types based on data content are as follows:
| JSON Data Type | Elasticsearch Type ("dynamic":"true") | Elasticsearch Type ("dynamic":"runtime") |
|---|---|---|
| null | No field added | No field added |
| true or false | boolean | boolean |
| double | float | double |
| long | long | long |
| object | object | No field added |
| array | Depends on the first non-null value in the array | Depends on the first non-null value in the array |
| string passing date detection | date | date |
| string passing numeric detection | float or long | double or long |
| string failing date or numeric detection | text with a .keyword sub-field | keyword |
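To see these rules in action, index a document against a fresh index with default settings (the index name is illustrative) and inspect the generated mapping:

```json
PUT /demo/_doc/1
{
  "active": true,
  "count": 7,
  "price": 19.99,
  "created": "2025-10-04",
  "note": "hello world"
}
```

Per the table above, with `"dynamic": "true"` this should map `active` as `boolean`, `count` as `long`, `price` as `float`, `created` as `date` (assuming default date detection is enabled), and `note` as `text` with a `.keyword` sub-field; `GET /demo/_mapping` shows the result.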
Dynamic Parameter Configuration Options
To control the index structure, it is recommended to adjust the dynamic parameter based on your scenario:
- `true` (default): New fields are automatically added to the mapping. Suitable for the development phase; not recommended for production environments.
- `runtime`: New fields exist as runtime fields; they are not indexed and are calculated on the fly during queries. Suitable for fields that are not frequently queried, saving storage space but resulting in poorer query performance.
- `false`: Ignores new fields. Data will still appear in `_source`, but it cannot be searched or indexed. This effectively prevents Mapping Explosion.
- `strict`: Throws an exception and rejects writes immediately when a new field is detected. This is the strictest control method, suitable for production environments with high structural requirements.
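For a production index, the parameter is set directly in the mapping. A sketch using `strict` (the index and field names are illustrative):

```json
PUT /orders
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "order_id":   { "type": "keyword" },
      "created_at": { "type": "date" }
    }
  }
}
```

With this configuration, indexing a document that contains any field other than `order_id` and `created_at` is rejected with a `strict_dynamic_mapping_exception`, so schema drift is caught at write time rather than discovered later as unplanned fields.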
Conclusion
While dynamic mapping is convenient, it is recommended to plan your Schema in advance and use Explicit Mapping in production environments. This ensures an optimal balance between storage space, query performance, and functional requirements, while avoiding the high costs of reindexing necessitated by structural changes later on.
Change Log
- 2025-10-04 Initial document creation.
